feature about chat completions reasoning, support gemini-3-pro thinking and support claude model enable thinking and interleaved thinking #163

Closed
caozhiyuan wants to merge 17 commits into ericc-ch:master from caozhiyuan:feature/chat-completions-reasoning

Conversation

@caozhiyuan
Contributor

@caozhiyuan caozhiyuan commented Dec 31, 2025

This pull request introduces significant improvements to how "thinking blocks" are handled and translated between Anthropic and OpenAI message formats. The changes ensure that reasoning and signatures are preserved during translation, add support for model-specific thinking budgets, and update protocol reminders for Claude models. Additionally, there are updates to API versioning and header intent values.

Thinking block and reasoning support:

  • Added signature to AnthropicThinkingBlock and support for thinkingBlockOpen in AnthropicStreamState to track reasoning blocks in streaming and non-streaming message translations. [1] [2] [3]
  • Updated translation logic to preserve reasoning_text and reasoning_opaque (signature) when converting between Anthropic and OpenAI formats, including new functions for extracting and injecting thinking blocks. [1] [2] [3] [4] [5]
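
As a rough illustration of the round trip described above, the following TypeScript sketch maps an OpenAI-style message carrying reasoning_text and reasoning_opaque to Anthropic-style content blocks and back. The interfaces and the function names toAnthropicBlocks and toOpenAIStyleMessage are simplified assumptions made for this example, not the PR's actual definitions.

```typescript
// Hypothetical, simplified shapes; the real interfaces in the PR carry more fields.
interface ThinkingBlock {
  type: "thinking"
  thinking: string
  signature: string
}

interface TextBlock {
  type: "text"
  text: string
}

type ContentBlock = ThinkingBlock | TextBlock

interface OpenAIStyleMessage {
  content: string | null
  reasoning_text?: string | null
  reasoning_opaque?: string | null
}

// OpenAI-style -> Anthropic-style: reasoning becomes a thinking block,
// with the opaque value carried as the block's signature.
function toAnthropicBlocks(message: OpenAIStyleMessage): Array<ContentBlock> {
  const blocks: Array<ContentBlock> = []
  if (message.reasoning_text) {
    blocks.push({
      type: "thinking",
      thinking: message.reasoning_text,
      signature: message.reasoning_opaque ?? "",
    })
  }
  if (message.content) {
    blocks.push({ type: "text", text: message.content })
  }
  return blocks
}

// Anthropic-style -> OpenAI-style: the first thinking block is mapped
// back to the reasoning fields; text blocks are joined into content.
function toOpenAIStyleMessage(blocks: Array<ContentBlock>): OpenAIStyleMessage {
  const thinking = blocks.find(
    (b): b is ThinkingBlock => b.type === "thinking",
  )
  const text = blocks
    .filter((b): b is TextBlock => b.type === "text")
    .map((b) => b.text)
    .join("\n\n")
  return {
    content: text.length > 0 ? text : null,
    reasoning_text: thinking?.thinking ?? null,
    reasoning_opaque: thinking?.signature || null,
  }
}
```

In this sketch a missing signature becomes an empty string on the Anthropic side and null on the OpenAI side; whether an empty signature is acceptable to a given client is a separate question that the review comments below also raise.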

Model-specific protocol and budget enforcement:

  • Added logic to inject system reminders and enforce interleaved thinking protocol for Claude models, and to calculate and pass a thinking_budget based on model capabilities. [1] [2]

API and header updates:

  • Updated Copilot and API version numbers, and changed openai-intent header from conversation-panel to conversation-agent for requests. [1] [2]

Token counting improvements:

  • Excluded reasoning_opaque from token counting when calculating message tokens to avoid miscounting opaque reasoning signatures.
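
A minimal sketch of the idea, assuming a message is a flat record of fields and using a stand-in word counter instead of a real tokenizer (ChatMessage, countTokens, and countMessageTokens are hypothetical names for this example, not the project's tokenizer code):

```typescript
// Hypothetical message shape; only string fields are counted in this sketch.
type ChatMessage = Record<string, unknown>

// Stand-in tokenizer (an assumption): counts whitespace-separated words,
// not real model tokens.
const countTokens = (text: string): number =>
  text.split(/\s+/).filter((word) => word.length > 0).length

function countMessageTokens(message: ChatMessage): number {
  let total = 0
  for (const [key, value] of Object.entries(message)) {
    // Skip the opaque signature so it does not inflate the count.
    if (key === "reasoning_opaque") continue
    if (typeof value === "string") total += countTokens(value)
  }
  return total
}
```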

Streaming translation refactor:

  • Refactored streaming translation logic to support thinking block handling and reasoning extraction, improving event sequencing for message starts, content, tool calls, and finish events. [1] [2] [3]

These updates collectively enhance the fidelity and protocol compliance of message translations between Anthropic and OpenAI, especially for Claude models.

Copilot AI review requested due to automatic review settings December 31, 2025 07:29
Contributor

Copilot AI left a comment

Pull request overview

This pull request adds comprehensive support for reasoning/thinking blocks in the translation layer between OpenAI/Copilot and Anthropic message formats, with a focus on enabling "interleaved thinking" for Claude models. The changes include significant refactoring of the streaming and non-streaming translation logic, API versioning updates, and infrastructure improvements.

Key changes:

  • Added signature field to thinking blocks and implemented bidirectional translation of reasoning content between OpenAI and Anthropic formats
  • Introduced thinking budget calculation with min/max constraints and automatic injection of Claude-specific system prompts for interleaved thinking workflows
  • Refactored streaming translation into separate handler functions for improved maintainability and added state tracking for thinking blocks

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| src/lib/api-config.ts | Updated Copilot API version to 2025-10-01, version to 0.35.0, changed openai-intent header, and refactored base URL logic to remove conditional for individual accounts |
| src/lib/tokenizer.ts | Added logic to skip reasoning_opaque field when calculating token counts |
| src/routes/messages/anthropic-types.ts | Added signature field to AnthropicThinkingBlock and thinkingBlockOpen state to AnthropicStreamState |
| src/routes/messages/handler.ts | Initialized thinkingBlockOpen field in streaming state |
| src/routes/messages/non-stream-translation.ts | Implemented thinking budget calculation, Claude-specific system prompt injection for interleaved thinking, enhanced assistant message handling to extract and filter thinking blocks with signatures, and updated response translation to include thinking blocks |
| src/routes/messages/stream-translation.ts | Major refactoring to separate concerns into handleMessageStart, handleThinkingText, handleContent, handleToolCalls, and handleFinish functions; added logic to handle reasoning_text and reasoning_opaque in streaming responses with proper thinking block state management |
| src/services/copilot/create-chat-completions.ts | Added reasoning_text, reasoning_opaque fields to Delta, ResponseMessage, and Message interfaces; added thinking_budget to ChatCompletionsPayload; exported Delta and Choice interfaces |
| src/services/copilot/get-models.ts | Added max_thinking_budget and min_thinking_budget fields to ModelSupports interface |
| src/services/get-vscode-version.ts | Updated fallback VSCode version from 1.104.3 to 1.107.0 |
| src/start.ts | Added idleTimeout: 0 configuration to prevent server timeout during idle periods |
| tests/anthropic-request.test.ts | Updated tests to include signature field in thinking blocks and verify reasoning_text instead of content for thinking content |
| tests/anthropic-response.test.ts | Added thinkingBlockOpen field initialization to streaming state in tests |

Comment on lines +348 to +361
events.push(
  {
    type: "content_block_delta",
    index: state.contentBlockIndex,
    delta: {
      type: "signature_delta",
      signature: "",
    },
  },
  {
    type: "content_block_stop",
    index: state.contentBlockIndex,
  },
)
Copilot AI Dec 31, 2025

The signature_delta event is emitted with an empty string when closing a thinking block. According to Anthropic's streaming protocol, an empty signature may not be a valid value for signature_delta events. Consider either omitting this event entirely or using a proper signature value. If an empty signature is intentional for Claude models, add a comment explaining this behavior.

Suggested change
- events.push(
-   {
-     type: "content_block_delta",
-     index: state.contentBlockIndex,
-     delta: {
-       type: "signature_delta",
-       signature: "",
-     },
-   },
-   {
-     type: "content_block_stop",
-     index: state.contentBlockIndex,
-   },
- )
+ events.push({
+   type: "content_block_stop",
+   index: state.contentBlockIndex,
+ })

Copilot uses AI. Check for mistakes.

allTextBlocks.push(...textBlocks)
allToolUseBlocks.push(...toolUseBlocks)
assistantContentBlocks.push(...thinkBlocks, ...textBlocks, ...toolUseBlocks)
Copilot AI Dec 31, 2025

The thinking blocks are placed before text blocks in the response (line 362), which means the thinking content will always appear first in the response regardless of its original position. This may not accurately represent the interleaved thinking flow if text was generated before thinking or if there were multiple rounds of thinking and text. Consider tracking the original order of blocks or documenting why thinking blocks must always come first in the response.

Suggested change
- assistantContentBlocks.push(...thinkBlocks, ...textBlocks, ...toolUseBlocks)
+ assistantContentBlocks.push(...textBlocks, ...thinkBlocks, ...toolUseBlocks)

})
state.contentBlockOpen = false
state.contentBlockIndex++
if (!toolBlockOpen) {
Copilot AI Dec 31, 2025

In the handleFinish function, when a content block is open and needs to be closed before finishing, the function calls handleReasoningOpaque only if the tool block is not open (line 67-69). However, this logic doesn't consider whether there's actually reasoning_opaque data in the delta. If choice.delta.reasoning_opaque is empty or undefined, handleReasoningOpaque will not emit any events, which is correct, but the conditional check creates unnecessary coupling. Consider adding a guard in handleReasoningOpaque itself or passing the delta more explicitly.

Suggested change
- if (!toolBlockOpen) {
+ if (!toolBlockOpen && choice.delta?.reasoning_opaque) {

content:
"<system-reminder>Please strictly follow Interleaved thinking</system-reminder>",
} as Message
return [...systemMessages, thinkingMessage, ...otherMessages]
Copilot AI Dec 31, 2025

The system-reminder message is placed immediately after system messages and before all other messages. This placement may break the expected message order when there are existing user/assistant message exchanges. The reminder should ideally be placed at the end of the messages array to avoid disrupting the conversation flow, or inserted more strategically based on the context. Consider moving it to the end or documenting why this specific placement is required.

Suggested change
- return [...systemMessages, thinkingMessage, ...otherMessages]
+ return [...systemMessages, ...otherMessages, thinkingMessage]

Comment on lines +67 to +71
const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget)
return Math.max(
budgetTokens,
model.capabilities.supports.min_thinking_budget ?? 1024,
)
Copilot AI Dec 31, 2025

The thinking budget calculation could return values that don't satisfy the minimum requirement. When thinking.budget_tokens is less than min_thinking_budget, the Math.max ensures the minimum is met. However, this could exceed maxThinkingBudget if the minimum is larger than the maximum. Consider validating that min_thinking_budget <= maxThinkingBudget before the calculation, or returning undefined if the constraints cannot be satisfied.

Suggested change
- const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget)
- return Math.max(
-   budgetTokens,
-   model.capabilities.supports.min_thinking_budget ?? 1024,
- )
+ const minThinkingBudget =
+   model.capabilities.supports.min_thinking_budget ?? 1024
+ // If the minimum required budget exceeds the maximum allowed, the
+ // constraints cannot be satisfied; fall back to no thinking budget.
+ if (minThinkingBudget > maxThinkingBudget) {
+   return undefined
+ }
+ const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget)
+ return Math.max(budgetTokens, minThinkingBudget)

},
)
state.contentBlockIndex++
state.thinkingBlockOpen = false
Copilot AI Dec 31, 2025

The thinking block state is set to false without checking if it was actually open. On line 237, state.thinkingBlockOpen is set to false unconditionally, but there's no check ensuring it was true before. This could lead to inconsistent state tracking. Consider only setting it to false if it was previously true, or add assertions to ensure the state transitions are valid.

Suggested change
- state.thinkingBlockOpen = false
+ if (state.thinkingBlockOpen) {
+   state.thinkingBlockOpen = false
+ }

Comment on lines +56 to +75
function getThinkingBudget(
  payload: AnthropicMessagesPayload,
  model: Model | undefined,
): number | undefined {
  const thinking = payload.thinking
  if (model && thinking) {
    const maxThinkingBudget = Math.min(
      model.capabilities.supports.max_thinking_budget ?? 0,
      (model.capabilities.limits.max_output_tokens ?? 0) - 1,
    )
    if (maxThinkingBudget > 0 && thinking.budget_tokens !== undefined) {
      const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget)
      return Math.max(
        budgetTokens,
        model.capabilities.supports.min_thinking_budget ?? 1024,
      )
    }
  }
  return undefined
}
Copilot AI Dec 31, 2025

The new thinking budget calculation logic (getThinkingBudget function) and interleaved thinking prompt injection for Claude models lack test coverage. These are significant new features that handle complex logic including min/max budget constraints and model-specific behavior. Consider adding tests that verify: 1) budget calculation with various model capabilities, 2) the system prompt injection for Claude models with thinking enabled, 3) the system-reminder message insertion.
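
A sketch of what the first of these checks might look like, with the clamping logic copied from the snippet under review and reduced stand-in types (ModelLike and PayloadLike are assumptions for this example; the real Model and AnthropicMessagesPayload types have more fields, and the project's test runner is not used here):

```typescript
// Simplified stand-ins for the PR's Model and payload types (assumptions).
interface ModelLike {
  capabilities: {
    supports: { max_thinking_budget?: number; min_thinking_budget?: number }
    limits: { max_output_tokens?: number }
  }
}

interface PayloadLike {
  thinking?: { budget_tokens?: number | undefined }
}

// Same clamping logic as the getThinkingBudget snippet under review.
function getThinkingBudget(
  payload: PayloadLike,
  model: ModelLike | undefined,
): number | undefined {
  const thinking = payload.thinking
  if (model && thinking) {
    const maxThinkingBudget = Math.min(
      model.capabilities.supports.max_thinking_budget ?? 0,
      (model.capabilities.limits.max_output_tokens ?? 0) - 1,
    )
    if (maxThinkingBudget > 0 && thinking.budget_tokens !== undefined) {
      const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget)
      return Math.max(
        budgetTokens,
        model.capabilities.supports.min_thinking_budget ?? 1024,
      )
    }
  }
  return undefined
}

// Made-up capability values, chosen only to exercise each branch.
const sampleModel: ModelLike = {
  capabilities: {
    supports: { max_thinking_budget: 32000, min_thinking_budget: 1024 },
    limits: { max_output_tokens: 16000 },
  },
}

// Each case: requested budget_tokens -> expected thinking budget.
const budgetCases: Array<[number | undefined, number | undefined]> = [
  [8000, 8000], // inside the allowed range, passed through
  [100, 1024], // below the minimum, raised to min_thinking_budget
  [50000, 15999], // above the cap, limited to max_output_tokens - 1
  [undefined, undefined], // no budget requested
]

for (const [requested, expected] of budgetCases) {
  const actual = getThinkingBudget(
    { thinking: { budget_tokens: requested } },
    sampleModel,
  )
  if (actual !== expected) {
    throw new Error(`budget ${requested}: expected ${expected}, got ${actual}`)
  }
}
```

The same table-driven shape could be extended to cover the Claude-only prompt injection and the reminder insertion, which depend on translation functions not reproduced here.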

index: state.contentBlockIndex,
})
state.contentBlockIndex++
state.contentBlockOpen = false
Copilot AI Dec 31, 2025

In handleReasoningOpaqueInToolCalls, when closing a content block that is not a tool block, the function doesn't check or update the thinkingBlockOpen state. If a thinking block was open when tool calls arrive, this could lead to inconsistent state where thinkingBlockOpen remains true even though the block has been closed. Consider calling closeThinkingBlockIfOpen before handling reasoning_opaque in tool calls, or explicitly set thinkingBlockOpen to false when appropriate.

Suggested change
- state.contentBlockOpen = false
+ state.contentBlockOpen = false
+ // Ensure thinking block state is consistent when closing a non-tool content block.
+ state.thinkingBlockOpen = false

Comment thread src/lib/api-config.ts Outdated
- state.accountType === "individual" ?
-   "https://api.githubcopilot.com"
- : `https://api.${state.accountType}.githubcopilot.com`
+ `https://api.${state.accountType}.githubcopilot.com`
Copilot AI Dec 31, 2025

The refactored copilotBaseUrl function now unconditionally uses the template https://api.${state.accountType}.githubcopilot.com, which will result in https://api.individual.githubcopilot.com for individual accounts. The previous implementation used https://api.githubcopilot.com (without the subdomain) for individual accounts. This is a breaking change that may cause API requests to fail for individual account users. Please verify that the GitHub Copilot API supports the new URL format for individual accounts, or restore the conditional logic.

Suggested change
- `https://api.${state.accountType}.githubcopilot.com`
+ state.accountType === "individual"
+   ? "https://api.githubcopilot.com"
+   : `https://api.${state.accountType}.githubcopilot.com`

Comment on lines +200 to +218
let thinkingBlocks = message.content.filter(
(block): block is AnthropicThinkingBlock => block.type === "thinking",
)

- // Combine text and thinking blocks, as OpenAI doesn't have separate thinking blocks
- const allTextContent = [
- ...textBlocks.map((b) => b.text),
- ...thinkingBlocks.map((b) => b.thinking),
- ].join("\n\n")
if (modelId.startsWith("claude")) {
thinkingBlocks = thinkingBlocks.filter(
(b) =>
b.thinking
&& b.thinking.length > 0
&& b.signature
&& b.signature.length > 0
// gpt signature has @ in it, so filter those out for claude models
&& !b.signature.includes("@"),
)
}

const thinkingContents = thinkingBlocks
.filter((b) => b.thinking && b.thinking.length > 0)
.map((b) => b.thinking)
Copilot AI Dec 31, 2025

The thinking blocks are filtered twice - once on line 200-202 to extract all thinking blocks, then again on lines 216-218 to filter those with non-empty thinking content. This redundant filtering is inefficient. Consider combining these filters or restructuring the logic to avoid processing the same blocks multiple times.

@zzb54321

zzb54321 commented Jan 1, 2026

I tried your branch locally, and encountered the following error.
image

@caozhiyuan
Contributor Author

@zzb54321 Does the master branch work OK?

@zzb54321

zzb54321 commented Jan 1, 2026

yeah master is working well

@caozhiyuan
Contributor Author

@zzb54321 Are you not on the individual plan?
You can change this code in api-config.ts:

export const copilotBaseUrl = (state: State) =>
  state.accountType === "individual" ?
    "https://api.githubcopilot.com"
  : `https://api.${state.accountType}.githubcopilot.com`

If it works, I will commit the fix.

@gonzalez962

@caozhiyuan Hello, and happy new year. Where and how do you get API_VERSION and COPILOT_VERSION? Thanks.

@caozhiyuan
Contributor Author

@gonzalez962 https://www.npmjs.com/package/@vscode/copilot-api?activeTab=code and https://github.com/microsoft/vscode-copilot-chat/

@zzb54321

zzb54321 commented Jan 2, 2026

@caozhiyuan It's working now after applying the suggested fix.

BTW, how can I verify that gemini-3-pro thinking is working on the chat client side? Does it work with the OpenAI-compatible API, or only with the Anthropic-format API?

@caozhiyuan
Contributor Author

caozhiyuan commented Jan 2, 2026

@zzb54321 Use the messages API; it is not a standard OpenAI-compatible protocol. If you do not apply the suggested fix, you can use --account-type business instead.

caozhiyuan and others added 4 commits January 2, 2026 22:21
When account type is not specified or set to 'individual', use the default
api.githubcopilot.com URL instead of constructing a subdomain-based URL.

This restores previous behavior where business users could work without
explicitly specifying their account type, as the default URL works for both
individual and business accounts.

Only constructs account-type-specific URLs (api.business.githubcopilot.com,
api.enterprise.githubcopilot.com) when those account types are explicitly
specified.
fix: use default API URL when account type is individual
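
The rule described in this commit message can be sketched as follows. This is a minimal illustration based on the copilotBaseUrl snippet quoted earlier in the thread; StateLike is a hypothetical stand-in for the project's State type, and the handling of an unspecified account type is an assumption taken from the wording above.

```typescript
// Minimal stand-in for the project's State type (assumption: only
// accountType matters for choosing the base URL).
interface StateLike {
  accountType?: string
}

// Default host for individual or unspecified account types; a
// subdomain-based host only when another account type is given.
const copilotBaseUrl = (state: StateLike): string =>
  !state.accountType || state.accountType === "individual" ?
    "https://api.githubcopilot.com"
  : `https://api.${state.accountType}.githubcopilot.com`

console.log(copilotBaseUrl({ accountType: "individual" }))
console.log(copilotBaseUrl({ accountType: "business" }))
```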
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 13 comments.


Comment on lines +426 to +431
if (reasoningText && reasoningText.length > 0) {
return [
{
type: "thinking",
thinking: reasoningText,
signature: reasoningOpaque || "",
Copilot AI Jan 4, 2026

The signature field in AnthropicThinkingBlock is now required (line 59 in anthropic-types.ts), but when reasoningOpaque is not provided, it defaults to an empty string (line 431). This is a breaking API change that could affect API consumers. Consider: 1) making the signature field optional to maintain backwards compatibility, 2) documenting that signature can be an empty string and what that means semantically, or 3) only including thinking blocks when both thinking and signature are non-empty to avoid exposing incomplete thinking blocks.

Suggested change
- if (reasoningText && reasoningText.length > 0) {
-   return [
-     {
-       type: "thinking",
-       thinking: reasoningText,
-       signature: reasoningOpaque || "",
+ if (
+   reasoningText &&
+   reasoningText.length > 0 &&
+   reasoningOpaque &&
+   reasoningOpaque.length > 0
+ ) {
+   return [
+     {
+       type: "thinking",
+       thinking: reasoningText,
+       signature: reasoningOpaque,

handleReasoningOpaque(choice.delta, events, state)
}
}

Copilot AI Jan 4, 2026

In the handleFinish function, when a finish_reason is received, the code checks if contentBlockOpen is true and closes it, but it doesn't check if thinkingBlockOpen is true. This means if a thinking block is still open when the message finishes (which could happen if reasoning_text arrives without a subsequent reasoning_opaque or content), the thinking block won't be properly closed, leaving the stream in an inconsistent state. Consider adding a check for state.thinkingBlockOpen and calling closeThinkingBlockIfOpen(state, events) before closing the message.

Suggested change
+ if (state.thinkingBlockOpen) {
+   closeThinkingBlockIfOpen(state, events)
+ }
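
To make the intended state handling concrete, here is a reduced sketch of a finish step that closes any open thinking block before ending the message. The StreamState and StreamEvent types and the finishStream function are hypothetical simplifications; the body of closeThinkingBlockIfOpen is an assumption about what such a helper might do, and real streams also carry other events that are omitted here.

```typescript
// Hypothetical, reduced versions of the PR's stream state and event types.
interface StreamState {
  contentBlockIndex: number
  contentBlockOpen: boolean
  thinkingBlockOpen: boolean
}

type StreamEvent =
  | { type: "content_block_stop"; index: number }
  | { type: "message_stop" }

// Closes the thinking block only when one is open, so repeated calls
// do not emit duplicate stop events.
function closeThinkingBlockIfOpen(
  state: StreamState,
  events: Array<StreamEvent>,
): void {
  if (!state.thinkingBlockOpen) return
  events.push({ type: "content_block_stop", index: state.contentBlockIndex })
  state.contentBlockIndex++
  state.thinkingBlockOpen = false
}

// Finish handling: close whatever is still open, then end the message.
function finishStream(state: StreamState, events: Array<StreamEvent>): void {
  closeThinkingBlockIfOpen(state, events)
  if (state.contentBlockOpen) {
    events.push({ type: "content_block_stop", index: state.contentBlockIndex })
    state.contentBlockIndex++
    state.contentBlockOpen = false
  }
  events.push({ type: "message_stop" })
}
```

Because the helper checks the flag itself, the caller does not need its own guard, which also addresses the earlier comment about unconditional state resets.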

Comment on lines +216 to +222
// handle for claude model
if (
delta.content === ""
&& delta.reasoning_opaque
&& delta.reasoning_opaque.length > 0
&& state.thinkingBlockOpen
) {
Copilot AI Jan 4, 2026

The comment on line 216 states "handle for claude model", but the code that follows (lines 217-222) doesn't actually check if the model is a Claude model. This logic will execute for any model that sends an empty content string with reasoning_opaque when a thinking block is open. Either add a model check (e.g., checking if the model ID starts with "claude") or update the comment to accurately reflect that this is a general handling for a specific streaming pattern, not Claude-specific behavior.

extraPrompt = `
<interleaved_thinking_protocol>
ABSOLUTE REQUIREMENT - NON-NEGOTIABLE:
The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully , MUST output a thinking block
Copilot AI Jan 4, 2026

There's a grammatical issue on line 135: "think carefully ," has an extra space before the comma. It should be "think carefully," without the space.

Suggested change
- The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully , MUST output a thinking block
+ The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully, MUST output a thinking block

Comment on lines +325 to +327
delta.content = delta.reasoning_text
delta.reasoning_text = undefined
return
Copilot AI Jan 4, 2026

Direct mutation of the delta object is problematic here. The function modifies the incoming delta parameter by setting delta.content = delta.reasoning_text and delta.reasoning_text = undefined. This mutates shared state that may be used elsewhere in the call stack, potentially causing unexpected side effects or making debugging difficult. Consider creating a copy of the delta object or handling this case differently without mutation, such as by tracking the state separately or processing the reasoning_text as intended.
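
One possible non-mutating shape, shown only as a sketch: DeltaLike is a hypothetical subset of the PR's Delta interface, and reasoningAsContent is an illustrative name rather than a function from the codebase.

```typescript
// Hypothetical subset of the PR's Delta interface.
interface DeltaLike {
  content?: string | null
  reasoning_text?: string | null
  reasoning_opaque?: string | null
}

// Returns a new delta in which reasoning_text has been moved into
// content; the caller's object is left untouched.
function reasoningAsContent(delta: DeltaLike): DeltaLike {
  const { reasoning_text: reasoningText, ...rest } = delta
  if (!reasoningText) return delta
  return { ...rest, content: reasoningText }
}
```

The caller would then pass the returned object to the content handler instead of relying on the original delta having been rewritten in place.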

Comment on lines +102 to +117
if (modelId.startsWith("claude") && thinkingBudget) {
  const reminder =
    "<system-reminder>you MUST follow interleaved_thinking_protocol</system-reminder>"
  const firstUserIndex = otherMessages.findIndex((m) => m.role === "user")
  if (firstUserIndex !== -1) {
    const userMessage = otherMessages[firstUserIndex]
    if (typeof userMessage.content === "string") {
      userMessage.content = reminder + "\n\n" + userMessage.content
    } else if (Array.isArray(userMessage.content)) {
      userMessage.content = [
        { type: "text", text: reminder },
        ...userMessage.content,
      ] as Array<ContentPart>
    }
  }
}
Copilot AI Jan 4, 2026

The system prompt injection logic (lines 102-117 and 131-143) only activates when thinkingBudget is truthy. However, getThinkingBudget returns undefined in several cases: when model is not found, when payload.thinking is not provided, when thinking.budget_tokens is undefined, or when maxThinkingBudget is 0 or negative. This means the interleaved thinking protocol instructions won't be injected unless all these conditions are met. Consider whether the protocol instructions should be injected whenever payload.thinking exists, regardless of budget calculation success, or document this behavior clearly so users understand when thinking protocol is enabled.


// handle for claude model
if (
delta.content === ""
Copilot AI Jan 4, 2026

The condition on line 218 checks if delta.content === "", which only matches exactly an empty string. However, this doesn't handle cases where delta.content is null or undefined. If the API can send delta.content as null or undefined along with reasoning_opaque, this condition won't match and the logic won't execute. Consider using !delta.content or explicitly checking for all falsy values: (delta.content === "" || delta.content === null || delta.content === undefined).

Suggested change
- delta.content === ""
+ (delta.content === "" || delta.content == null)

Comment on lines +218 to +232
if (modelId.startsWith("claude")) {
thinkingBlocks = thinkingBlocks.filter(
(b) =>
b.thinking
&& b.thinking.length > 0
&& b.signature
&& b.signature.length > 0
// gpt signature has @ in it, so filter those out for claude models
&& !b.signature.includes("@"),
)
}

const thinkingContents = thinkingBlocks
.filter((b) => b.thinking && b.thinking.length > 0)
.map((b) => b.thinking)
Copilot AI Jan 4, 2026

The filtering on line 231 checks if b.thinking && b.thinking.length > 0, which is redundant for Claude models because the same check was already done on lines 221-222. While this doesn't cause incorrect behavior, it adds unnecessary processing. Consider restructuring to avoid double filtering - for example, apply the thinking content filter before the Claude-specific signature filter, or ensure thinking blocks always have valid thinking content when they're created.

Suggested change
- if (modelId.startsWith("claude")) {
-   thinkingBlocks = thinkingBlocks.filter(
-     (b) =>
-       b.thinking
-       && b.thinking.length > 0
-       && b.signature
-       && b.signature.length > 0
-       // gpt signature has @ in it, so filter those out for claude models
-       && !b.signature.includes("@"),
-   )
- }
- const thinkingContents = thinkingBlocks
-   .filter((b) => b.thinking && b.thinking.length > 0)
-   .map((b) => b.thinking)
+ // First, ensure all thinking blocks have non-empty thinking content
+ thinkingBlocks = thinkingBlocks.filter(
+   (b) => b.thinking && b.thinking.length > 0,
+ )
+ if (modelId.startsWith("claude")) {
+   thinkingBlocks = thinkingBlocks.filter(
+     (b) =>
+       b.signature
+       && b.signature.length > 0
+       // gpt signature has @ in it, so filter those out for claude models
+       && !b.signature.includes("@"),
+   )
+ }
+ const thinkingContents = thinkingBlocks.map((b) => b.thinking)

Comment on lines +56 to +143
function getThinkingBudget(
payload: AnthropicMessagesPayload,
model: Model | undefined,
): number | undefined {
const thinking = payload.thinking
if (model && thinking) {
const maxThinkingBudget = Math.min(
model.capabilities.supports.max_thinking_budget ?? 0,
(model.capabilities.limits.max_output_tokens ?? 0) - 1,
)
if (maxThinkingBudget > 0 && thinking.budget_tokens !== undefined) {
const budgetTokens = Math.min(thinking.budget_tokens, maxThinkingBudget)
return Math.max(
budgetTokens,
model.capabilities.supports.min_thinking_budget ?? 1024,
)
}
}
return undefined
}

function translateModelName(model: string): string {
// Subagent requests use a specific model number which Copilot doesn't support
if (model.startsWith("claude-sonnet-4-")) {
return model.replace(/^claude-sonnet-4-.*/, "claude-sonnet-4")
- } else if (model.startsWith("claude-opus-")) {
+ } else if (model.startsWith("claude-opus-4-")) {
return model.replace(/^claude-opus-4-.*/, "claude-opus-4")
}
return model
}

function translateAnthropicMessagesToOpenAI(
- anthropicMessages: Array<AnthropicMessage>,
- system: string | Array<AnthropicTextBlock> | undefined,
+ payload: AnthropicMessagesPayload,
+ modelId: string,
+ thinkingBudget: number | undefined,
): Array<Message> {
- const systemMessages = handleSystemPrompt(system)

- const otherMessages = anthropicMessages.flatMap((message) =>
+ const systemMessages = handleSystemPrompt(
+ payload.system,
+ modelId,
+ thinkingBudget,
+ )
+ const otherMessages = payload.messages.flatMap((message) =>
message.role === "user" ?
handleUserMessage(message)
- : handleAssistantMessage(message),
+ : handleAssistantMessage(message, modelId),
)

if (modelId.startsWith("claude") && thinkingBudget) {
const reminder =
"<system-reminder>you MUST follow interleaved_thinking_protocol</system-reminder>"
const firstUserIndex = otherMessages.findIndex((m) => m.role === "user")
if (firstUserIndex !== -1) {
const userMessage = otherMessages[firstUserIndex]
if (typeof userMessage.content === "string") {
userMessage.content = reminder + "\n\n" + userMessage.content
} else if (Array.isArray(userMessage.content)) {
userMessage.content = [
{ type: "text", text: reminder },
...userMessage.content,
] as Array<ContentPart>
}
}
}
return [...systemMessages, ...otherMessages]
}

function handleSystemPrompt(
system: string | Array<AnthropicTextBlock> | undefined,
modelId: string,
thinkingBudget: number | undefined,
): Array<Message> {
if (!system) {
return []
}

let extraPrompt = ""
if (modelId.startsWith("claude") && thinkingBudget) {
extraPrompt = `
<interleaved_thinking_protocol>
ABSOLUTE REQUIREMENT - NON-NEGOTIABLE:
The current thinking_mode is interleaved, Whenever you have the result of a function call, think carefully , MUST output a thinking block
RULES:
Tool result → thinking block (ALWAYS, no exceptions)
This is NOT optional - it is a hard requirement
The thinking block must contain substantive reasoning (minimum 3-5 sentences)
Think about: what the results mean, what to do next, how to answer the user
NEVER skip this step, even if the result seems simple or obvious
</interleaved_thinking_protocol>`
}

Copilot AI Jan 4, 2026

The new thinking budget calculation logic (lines 56-75) and system prompt injection logic (lines 102-117, 131-143) lack test coverage. These are critical features that manipulate model behavior and user inputs. Consider adding tests that verify: 1) thinking budget is correctly calculated when thinking.budget_tokens is provided, 2) thinking budget respects min/max boundaries from model capabilities, 3) system prompt injection happens only for Claude models with thinking budget, 4) the interleaved thinking protocol reminder is correctly prepended to the first user message.
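
As an illustration of the boundary checks this comment asks for, here is a minimal sketch of a clamped budget calculation. The PR's actual calculation is not shown in this excerpt, so everything here is an assumption: `clampThinkingBudget`, `ThinkingLimits`, and the `minBudget`/`maxBudget` field names are hypothetical.

```typescript
// Hypothetical capability shape; the project's real field names may differ.
interface ThinkingLimits {
  minBudget?: number
  maxBudget?: number
}

// Hypothetical helper: clamps the requested budget_tokens into the range
// the model advertises. Returns undefined when thinking is not requested
// or the model exposes no thinking limits.
function clampThinkingBudget(
  requested: number | undefined,
  limits: ThinkingLimits | undefined,
): number | undefined {
  if (requested === undefined || limits === undefined) return undefined
  const min = limits.minBudget ?? 0
  const max = limits.maxBudget ?? Number.POSITIVE_INFINITY
  return Math.min(Math.max(requested, min), max)
}

const limits: ThinkingLimits = { minBudget: 1024, maxBudget: 32000 }
const tooSmall = clampThinkingBudget(500, limits) // raised to the minimum
const tooLarge = clampThinkingBudget(50000, limits) // lowered to the maximum
const inRange = clampThinkingBudget(8000, limits) // passed through unchanged
```

A pure function like this covers points 1 and 2 of the comment with simple input/output assertions and needs no mocking of the Copilot API.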

Comment on lines +275 to 375
function handleReasoningOpaque(
delta: Delta,
events: Array<AnthropicStreamEventData>,
state: AnthropicStreamState,
) {
if (delta.reasoning_opaque && delta.reasoning_opaque.length > 0) {
events.push(
{
type: "content_block_start",
index: state.contentBlockIndex,
content_block: {
type: "thinking",
thinking: "",
},
},
{
type: "content_block_delta",
index: state.contentBlockIndex,
delta: {
type: "thinking_delta",
thinking: "",
},
},
{
type: "content_block_delta",
index: state.contentBlockIndex,
delta: {
type: "signature_delta",
signature: delta.reasoning_opaque,
},
},
{
type: "content_block_stop",
index: state.contentBlockIndex,
},
)
state.contentBlockIndex++
}
}

function handleThinkingText(
delta: Delta,
state: AnthropicStreamState,
events: Array<AnthropicStreamEventData>,
) {
if (delta.reasoning_text && delta.reasoning_text.length > 0) {
// Work around the Copilot API occasionally sending content, reasoning_text, and reasoning_opaque in separate deltas, in that order.
// This is highly abnormal and probably a server-side bug.
// It has only been seen with Claude models, and only rarely.
if (state.contentBlockOpen) {
delta.content = delta.reasoning_text
delta.reasoning_text = undefined
return
}

if (!state.thinkingBlockOpen) {
events.push({
type: "content_block_start",
index: state.contentBlockIndex,
content_block: {
type: "thinking",
thinking: "",
},
})
state.thinkingBlockOpen = true
}

events.push({
type: "content_block_delta",
index: state.contentBlockIndex,
delta: {
type: "thinking_delta",
thinking: delta.reasoning_text,
},
})
}
}

function closeThinkingBlockIfOpen(
state: AnthropicStreamState,
events: Array<AnthropicStreamEventData>,
): void {
if (state.thinkingBlockOpen) {
events.push(
{
type: "content_block_delta",
index: state.contentBlockIndex,
delta: {
type: "signature_delta",
signature: "",
},
},
{
type: "content_block_stop",
index: state.contentBlockIndex,
},
)
state.contentBlockIndex++
state.thinkingBlockOpen = false
}
}

Copilot AI Jan 4, 2026


The new streaming translation logic for handling thinking blocks and reasoning_opaque (lines 275-313, 315-375) lacks test coverage. This is critical functionality that manages complex state transitions during streaming, including thinking block opening/closing and signature handling. Consider adding tests that verify: 1) reasoning_text is correctly translated to thinking_delta events, 2) reasoning_opaque creates appropriate signature_delta events, 3) thinking blocks are properly closed before content or tool call blocks, 4) the state.thinkingBlockOpen flag is managed correctly throughout the streaming lifecycle.
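
To make the expected state transitions concrete, the following sketch replays two reasoning deltas and then closes the block, mirroring the shape of `handleThinkingText` and `closeThinkingBlockIfOpen` from the diff. The types are simplified stand-ins, and `pushThinkingText` and `closeThinking` are hypothetical names used only for this illustration.

```typescript
// Simplified stand-ins for AnthropicStreamEventData and AnthropicStreamState.
type StreamEvent =
  | {
      type: "content_block_start"
      index: number
      content_block: { type: "thinking"; thinking: string }
    }
  | {
      type: "content_block_delta"
      index: number
      delta:
        | { type: "thinking_delta"; thinking: string }
        | { type: "signature_delta"; signature: string }
    }
  | { type: "content_block_stop"; index: number }

interface StreamState {
  contentBlockIndex: number
  thinkingBlockOpen: boolean
}

// Opens a thinking block on the first chunk, then emits one delta per chunk.
function pushThinkingText(text: string, state: StreamState, events: Array<StreamEvent>): void {
  if (text.length === 0) return
  if (!state.thinkingBlockOpen) {
    events.push({
      type: "content_block_start",
      index: state.contentBlockIndex,
      content_block: { type: "thinking", thinking: "" },
    })
    state.thinkingBlockOpen = true
  }
  events.push({
    type: "content_block_delta",
    index: state.contentBlockIndex,
    delta: { type: "thinking_delta", thinking: text },
  })
}

// Closes an open thinking block with an empty signature and advances the index.
function closeThinking(state: StreamState, events: Array<StreamEvent>): void {
  if (!state.thinkingBlockOpen) return
  events.push(
    {
      type: "content_block_delta",
      index: state.contentBlockIndex,
      delta: { type: "signature_delta", signature: "" },
    },
    { type: "content_block_stop", index: state.contentBlockIndex },
  )
  state.contentBlockIndex++
  state.thinkingBlockOpen = false
}

const state: StreamState = { contentBlockIndex: 0, thinkingBlockOpen: false }
const events: Array<StreamEvent> = []
pushThinkingText("Let me check", state, events)
pushThinkingText(" the result.", state, events)
closeThinking(state, events)

// Flatten to a readable sequence: delta events report their delta type.
const eventTypes = events.map((e) =>
  e.type === "content_block_delta" ? e.delta.type : e.type,
)
// eventTypes: content_block_start, thinking_delta, thinking_delta,
// signature_delta, content_block_stop
```

Asserting on the flattened `eventTypes` sequence, together with the final `contentBlockIndex` and `thinkingBlockOpen` values, would cover points 1, 3, and 4 of the comment without a live stream.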

@caozhiyuan caozhiyuan closed this Jan 4, 2026
5 participants